Correlation of Data Reconstruction Error and Shrinkages in Pair-wise Distances under Principal Component Analysis (PCA)

Author

  • Abdulrahman Oladipupo Ibraheem
Abstract

In this ongoing work, I explore certain theoretical and empirical implications of data transformations under PCA. In particular, I state and prove three theorems about PCA, which I paraphrase as follows: (1) PCA without discarding eigenvector rows is injective, but loses this injectivity when eigenvector rows are discarded. (2) PCA without discarding eigenvector rows preserves pair-wise distances, but tends to cause pair-wise distances to shrink when eigenvector rows are discarded. (3) For any pair of points, the shrinkage in pair-wise distance is bounded above by an L1 norm reconstruction error associated with the points. Theorem (3) suggests that there might be some correlation between shrinkages in pair-wise distances and the mean square reconstruction error, which is defined as the sum of the eigenvalues associated with the discarded eigenvectors. I therefore performed numerical experiments to measure the correlation between the sum of those eigenvalues and shrinkages in pair-wise distances. I also performed experiments to check, respectively, the effect of the sum of those eigenvalues and the effect of the shrinkages on classification accuracies under the PCA map. So far, I have obtained the following results on some publicly available data from the UCI Machine Learning Repository: (1) There seems to be a strong correlation between the sum of the eigenvalues associated with discarded eigenvectors and shrinkages in pair-wise distances. (2) Neither the sum of those eigenvalues nor pair-wise distances have any strong correlation with classification accuracies.

arXiv:1412.6752v1 [cs.LG] 21 Dec 2014
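The claims in the abstract can be checked numerically on synthetic data. The following is a minimal sketch, not the author's experimental code: the function names (`pca_fit`, `pairwise_distances`, `shrinkage_report`) and the random data are hypothetical, and NumPy is assumed. The abstract does not give the exact form of the L1 bound in theorem (3), so the bound computed here is an assumption: the L1 norm of the difference between the two points' reconstruction residuals, which dominates the shrinkage by the triangle inequality.

```python
import numpy as np


def pca_fit(X):
    """Return the data mean, eigenvalues (descending) and eigenvectors as rows."""
    mu = X.mean(axis=0)
    # bias=True divides by n, so the discarded eigenvalues sum to the
    # mean squared reconstruction error exactly.
    cov = np.cov(X, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return mu, evals[order], evecs[:, order].T


def pairwise_distances(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def shrinkage_report(X, k):
    """Keep the top-k eigenvector rows and measure what is lost."""
    mu, evals, W = pca_fit(X)
    Z = (X - mu) @ W[:k].T               # PCA map with d - k rows discarded
    residual = (X - mu) - Z @ W[:k]      # per-point reconstruction error
    shrink = pairwise_distances(X) - pairwise_distances(Z)
    mse = (residual ** 2).sum(axis=1).mean()
    # Assumed illustrative bound: L1 norm of the residual difference per pair.
    l1_bound = np.abs(residual[:, None, :] - residual[None, :, :]).sum(axis=-1)
    return shrink, mse, evals[k:].sum(), l1_bound


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
for k in (6, 3, 1):
    shrink, mse, tail, l1_bound = shrinkage_report(X, k)
    bounded = bool((shrink <= l1_bound + 1e-9).all())
    print(k, shrink.max(), mse, tail, bounded)
```

With all rows kept (`k` equal to the data dimension) the shrinkage should vanish up to rounding, and for smaller `k` the mean squared reconstruction error should match the sum of the discarded eigenvalues.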

Similar resources

Quantitative principal component model for skin chromophore mapping using multi-spectral images and spatial priors

We describe a novel reconstruction algorithm based on Principal Component Analysis (PCA) applied to multi-spectral imaging data. Using numerical phantoms, based on a two-layered skin model developed previously, we found analytical expressions that convert qualitative PCA results into quantitative blood volume and oxygenation values, assuming the epidermal thickness to be known. We also evalua...

Principal component analysis of CYP2C9 and CYP3A4 probe substrate/inhibitor panels.

Cytochrome P450 (P450) inhibition often occurs in a strongly substrate- and inhibitor-dependent manner, with a given inhibitor affecting the metabolism of different substrates to differing degrees and with a given substrate responding differently to different inhibitors. Traditionally, patterns of functional similarity and dissimilarity among substrates and inhibitors have been studied using cl...

Derivation of regression models for pan evaporation estimation

Evaporation is an essential component of the hydrological cycle. Several meteorological factors play a role in the amount of pan evaporation. These factors are often related to each other. In this study, multiple linear regression (MLR) in conjunction with Principal Component Analysis (PCA) was used for modeling pan evaporation. After the standardization of the variables, independent components were...

Non-Greedy L21-Norm Maximization for Principal Component Analysis

Principal Component Analysis (PCA) is one of the most important unsupervised methods for handling high-dimensional data. However, due to the high computational complexity of its eigendecomposition solution, it is hard to apply PCA to large-scale data with high dimensionality. Meanwhile, the squared L2-norm based objective makes it sensitive to data outliers. In recent research, the L1-norm maximi...

Representing Spectral data using LabPQR color space in comparison to PCA method

In many applications of color technology, such as spectral color reproduction, it is of interest to represent spectral data with fewer dimensions than the spectral space has. For more than half a century, the Principal Component Analysis (PCA) method has been applied to find the number of independent basis vectors of a spectral dataset and to represent spectral reflectance with lower di...

Journal title:
  • CoRR

Volume abs/1412.6752

Publication date 2014